[iOS][Globalization] Simple IndexOf support with CompareOptions.IgnoreSymbols #118523

Open
wants to merge 6 commits into
base: main

matouskozak
Member

@matouskozak matouskozak commented Aug 8, 2025

Naive implementation of CompareOptions.IgnoreSymbols for IndexOf on iOS. Because the Objective-C string APIs (https://developer.apple.com/documentation/foundation/nsstring/compareoptions?language=objc) don't provide a direct equivalent of the IgnoreSymbols option, we use a regex to remove the symbols from the strings.

The implementation works as follows:

  1. Preprocess the source and search strings by removing the symbols.
  2. Calculate the IndexOf on the preprocessed strings.
  3. Map the range from preprocessed source string back to the original string to get the index and length.
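
The three steps above can be sketched in a few lines. This is an illustrative Python sketch, not the Objective-C code from this PR: the helper names (`is_ignorable`, `strip_with_mapping`, `index_of_ignore_symbols`) are hypothetical, `unicodedata` general categories stand in for the `[\p{P}\p{S}\p{Z}]` regex, indices are Python code points rather than UTF-16 code units, and the handling of an empty search string is an assumption.

```python
import unicodedata


def is_ignorable(ch: str) -> bool:
    # First letter of the Unicode general category:
    # P = punctuation, S = symbol, Z = separator.
    return unicodedata.category(ch)[0] in "PSZ"


def strip_with_mapping(text: str):
    """Return (stripped, mapping); mapping[i] is the index in `text` of stripped[i]."""
    kept, mapping = [], []
    for i, ch in enumerate(text):
        if not is_ignorable(ch):
            kept.append(ch)
            mapping.append(i)
    return "".join(kept), mapping


def index_of_ignore_symbols(source: str, value: str):
    """Return (index, length) in the original `source`, or (-1, 0) if not found."""
    # Step 1: preprocess both strings by removing symbols.
    stripped_source, mapping = strip_with_mapping(source)
    stripped_value, _ = strip_with_mapping(value)
    if not stripped_value:
        return (0, 0)  # assumption: an empty search string matches at index 0

    # Step 2: plain search on the preprocessed strings.
    pos = stripped_source.find(stripped_value)
    if pos < 0:
        return (-1, 0)

    # Step 3: map the first and last matched characters back to the original.
    start = mapping[pos]
    end = mapping[pos + len(stripped_value) - 1]
    return (start, end - start + 1)


print(index_of_ignore_symbols("He-llo, wor$ld", "lo wo"))  # (4, 6), i.e. "lo, wo"
```

Note that the returned length covers the comma and space inside the match, since they sit between the first and last matched characters in the original string.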

This PR also includes a small refactoring of the iOS globalization codebase and enables more tests.

#111895

Contributor

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

@matouskozak
Member Author

/azp run runtime-extra-platforms

Azure Pipelines successfully started running 1 pipeline(s).

@matouskozak
Copy link
Member Author

/azp run runtime-extra-platforms

Azure Pipelines successfully started running 1 pipeline(s).

@matouskozak matouskozak requested a review from Copilot August 12, 2025 12:29
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements CompareOptions.IgnoreSymbols support for iOS's hybrid globalization by adding symbol removal functionality using regex patterns. The implementation preprocesses strings to remove symbols, performs comparisons on cleaned strings, and maps results back to original string coordinates for IndexOf operations.

Key Changes:

  • Added regex-based symbol removal functions with position mapping for IndexOf operations
  • Extended all string comparison functions (Compare, IndexOf, StartsWith, EndsWith) to support IgnoreSymbols
  • Enabled previously disabled tests by removing iOS platform exclusions for IgnoreSymbols functionality

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/native/libs/System.Globalization.Native/pal_collation.m | Core implementation adding symbol removal functions and IgnoreSymbols support to comparison operations |
| src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.iOS.cs | Updated supported comparison options to include IgnoreSymbols |
| src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.Compare.cs | Removed iOS platform exclusions for IgnoreSymbols tests |
| src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.IndexOf.cs | Removed iOS platform exclusions for IgnoreSymbols tests |
| src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.IsPrefix.cs | Updated test expectations and removed some iOS platform exclusions |
| src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.IsSuffix.cs | Removed iOS platform exclusions for IgnoreSymbols tests |
| src/libraries/System.Runtime/tests/System.Globalization.Tests/CompareInfo/CompareInfoTests.LastIndexOf.cs | Removed iOS platform exclusions for IgnoreSymbols tests |

if (mapping == nil || mapping.count == 0 || modifiedRange.location >= mapping.count)
return invalidRange;

int32_t mappedLocation = [mapping[(NSUInteger)modifiedRange.location] intValue];

Copilot AI Aug 12, 2025

The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.
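
To make the concern concrete, here is a small Python sketch of the range mapping with the same bounds checks as the excerpt. The function name `map_range_to_original` is hypothetical, and the length formula (mapped end minus mapped start, plus one) is an assumption, since the excerpt does not show how the PR computes the final length. Treating a non-positive length as invalid is also an assumption.

```python
def map_range_to_original(mapping, location, length):
    """Map a (location, length) range in the stripped string to the original.

    mapping[i] is the index in the original string of the i-th kept character.
    Returns (start, length) in the original string, or None for an invalid range.
    """
    if not mapping or length <= 0 or location < 0 or location >= len(mapping):
        return None

    end_index = location + length - 1
    if end_index >= len(mapping):
        return None

    start = mapping[location]
    end = mapping[end_index]
    # Span everything from the first to the last matched character,
    # including any removed symbols that sit between them.
    return (start, end - start + 1)


# "a-b-c" stripped is "abc", with mapping [0, 2, 4].
print(map_range_to_original([0, 2, 4], 0, 3))  # (0, 5): covers both hyphens
```

Under this formula, a match of length 3 in the stripped string maps to length 5 in the original, so symbols removed inside the match are included in the reported range.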

Copilot uses AI. Check for mistakes.

int32_t mappedLocation = [mapping[(NSUInteger)modifiedRange.location] intValue];

// Calculate the mapped length by finding the end position in the original string
NSUInteger endIndex = modifiedRange.location + modifiedRange.length - 1;

Copilot AI Aug 12, 2025

The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.

if (endIndex >= mapping.count)
return invalidRange;

int32_t mappedEndLocation = [mapping[endIndex] intValue];

Copilot AI Aug 12, 2025

The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.

Suggested change
int32_t mappedEndLocation = [mapping[endIndex] intValue];
int32_t mappedEndLocation = [mapping[endIndex] intValue];
// The mapped length should cover all characters between mappedLocation and mappedEndLocation, including any removed symbols.

if (endIndex >= mapping.count)
return invalidRange;

int32_t mappedEndLocation = [mapping[endIndex] intValue];

Copilot AI Aug 12, 2025

The calculation assumes a contiguous range in the original string, but when symbols are removed, the characters may not be adjacent in the original string. The mapped length should account for all characters between the start and end positions, including any removed symbols in between.

Suggested change
int32_t mappedEndLocation = [mapping[endIndex] intValue];
int32_t mappedEndLocation = [mapping[endIndex] intValue];
// mappedLength should include all characters between mappedLocation and mappedEndLocation in the original string,
// including any removed symbols in between.

yield return new object[] { s_invariantCompare, "\uD800\uDC00", "\uD800", CompareOptions.None, true, 1 };
yield return new object[] { s_invariantCompare, "\uD800\uDC00", "\uD800", CompareOptions.IgnoreCase, true, 1 };
}
else
{
yield return new object[] { s_hungarianCompare, "dzsdzsfoobar", "ddzsf", CompareOptions.None, false, 0 };
if (PlatformDetection.IsNotHybridGlobalizationOnApplePlatform)
yield return new object[] { s_invariantCompare, "''Tests", "Tests", CompareOptions.IgnoreSymbols, false, 0 };
yield return new object[] { s_invariantCompare, "''Tests", "Tests", CompareOptions.IgnoreSymbols, PlatformDetection.IsHybridGlobalizationOnApplePlatform ? true : false, 0 };
Member

Can you add a comment with a link to the issue tracking the wrong results we get in this case?
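
For context, one plausible reading of why the hybrid path returns `true` here: if both strings are stripped before the prefix check, `"''Tests"` becomes `"Tests"` and the check succeeds. The Python sketch below only illustrates that reasoning. The helper names are hypothetical, `unicodedata` categories stand in for the PR's regex, and it says nothing about why the other platforms return `false`.

```python
import unicodedata


def strip_symbols(text: str) -> str:
    # Drop punctuation (P*), symbols (S*), and separators (Z*).
    return "".join(ch for ch in text if unicodedata.category(ch)[0] not in "PSZ")


def is_prefix_ignore_symbols(source: str, prefix: str) -> bool:
    # Compare only the characters that survive stripping.
    return strip_symbols(source).startswith(strip_symbols(prefix))


print(strip_symbols("''Tests"))                      # Tests
print(is_prefix_ignore_symbols("''Tests", "Tests"))  # True
```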

// \p{S} - Symbols (currency, mathematical, modifier, and other symbols)
// \p{Z} - Separators (space, line, and paragraph separators)
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[\\p{P}\\p{S}\\p{Z}]" options:NSRegularExpressionCaseInsensitive error:&error];

Member

Would it make sense to factor this into its own inlined method? I see it used more than once.

NSString *charString = [NSString stringWithCharacters:&ch length:1];

// Check if this character matches the symbol pattern
NSRange matchRange = [regex rangeOfFirstMatchInString:charString options:0 range:NSMakeRange(0, 1)];
Member

NSRange matchRange = [regex rangeOfFirstMatchInString:charString options:0 range:NSMakeRange(0, 1)];

I am not sure what the performance will look like with that. Can't this be done in managed code using CharUnicodeInfo?


2 participants